Ideas
Cluster of design performances;
Highly variable simulation variances;
[^2] Taken from class slides Introduction-S24.pdf.
We assume that the performance of high-fidelity simulations \(Y_{xi}\) for design \(x\) follows a normal distribution with unknown mean \(\theta_x\) and known variance \(\sigma_x^2\):
\[\begin{equation} Y_{xi} \mid \theta_x \sim \mathcal{N}(\theta_x, \sigma_x^2), \quad i = 1, \dots, l_x, \end{equation}\]
where \(l_x\) is the number of high-fidelity simulations/repetitions for design \(x\).
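As a quick, purely illustrative sketch, the sampling model \(Y_{xi} \sim \mathcal{N}(\theta_x, \sigma_x^2)\) can be simulated with NumPy; the design means, variances, and replication counts below are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: k = 3 designs with true means theta_x and
# design-specific variances sigma_x^2 (all values illustrative).
theta = np.array([1.0, 2.0, 3.0])
sigma2 = np.array([0.5, 2.0, 0.1])
l = np.array([50, 50, 50])  # l_x replications per design

# Y_xi ~ N(theta_x, sigma_x^2), i = 1, ..., l_x
Y = [rng.normal(theta[x], np.sqrt(sigma2[x]), size=l[x]) for x in range(3)]

sample_means = np.array([y.mean() for y in Y])
sample_vars = np.array([y.var(ddof=1) for y in Y])
```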
This assumption is very strong and may not always hold in practice.
Unrealistic in real-world scenarios:
In many practical applications, the true variances of the designs are unknown and must be estimated from the observed simulation outputs (often because of the complexity of an already sophisticated model).
Assuming known variances overlooks the uncertainty associated with the variance estimation process.
Ignores the heterogeneity of variances across designs:
Different designs may exhibit different levels of variability due to factors such as complexity, sensitivity to input parameters, or inherent uncertainties in the simulation process.
Assuming a common known variance for all designs fails to capture the heterogeneity in the variance structure across the design space.
Impacts the accuracy of ranking and selection:
Ranking and selection (R&S) of the best design is based on comparing the sample means and variances of the designs.
Incorrect assumptions about the variances can distort the ranking and lead to the selection of inferior designs, compromising the overall optimization process.
We could try to estimate the sample variances from the data, which would add another level of complexity to the model.
What if we treat the variances \(\sigma_x^2\) as unknown and model them as random effects?
Mixed effects models, also known as hierarchical or multilevel models, are a powerful statistical framework for analyzing data with complex structures, such as repeated measures, nested designs, or clustered observations.
The general formulation of a linear mixed effects model is given by: \[ Y = \underbrace{X\beta}_{\text{fixed}} + \overbrace{Zb}^{\text{random}} + \epsilon \] where \(X\) is the fixed-effects design matrix, \(\beta\) is the vector of fixed effects, \(Z\) is the random-effects design matrix, \(b\) is the vector of random effects, and \(\epsilon\) is the residual error term.
Random effects in mixed effects models capture the unobserved heterogeneity among individual units or groups within the population.
They allow for the estimation of unit-specific deviations from the population average, providing a more realistic representation of the variability in the data.
The estimation of random effects does introduce additional complexity in the model, requiring specialized estimation techniques such as restricted maximum likelihood (REML) or Bayesian methods.
Random effects models are popular in biomedical research because of their ability to account for inter-subject variability and correlation among repeated measurements.
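To make the formulation concrete, here is a minimal simulation of \(Y = X\beta + Zb + \epsilon\) with a single random intercept per group; all dimensions and parameter values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative dimensions (all hypothetical): n observations,
# p fixed effects, q groups with one random intercept each.
n, p, q = 120, 2, 6

X = rng.normal(size=(n, p))          # fixed-effects design matrix
beta = np.array([1.5, -0.5])         # fixed effects

group = np.repeat(np.arange(q), n // q)
Z = np.zeros((n, q))                 # random-effects design matrix
Z[np.arange(n), group] = 1.0         # one random intercept per group
b = rng.normal(0.0, 1.0, size=q)     # b ~ N(0, tau^2 I), tau = 1 here

eps = rng.normal(0.0, 0.3, size=n)   # residual error
Y = X @ beta + Z @ b + eps           # Y = X*beta + Z*b + eps
```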
Instead of assuming known variances, we introduce random effects to capture the heterogeneity in the variances across designs.
To introduce random effects for the variances, we assume that the variances \(\sigma_x^2\) come from a common distribution up to unknown parameters. Let \(\sigma_x^2\) follow a distribution \(D\) with parameters \(\phi\):
\[ \sigma_x^2 \sim D(\phi) \]
If \(X \sim \text{Gamma}(\alpha, \beta)\), then \(1/X \sim \text{InvGamma}(\alpha, \beta)\).
The inverse-gamma distribution is widely used in Bayesian statistics as a prior distribution for variance parameters, particularly when working with models that assume normally distributed data (closed-form posterior).
The parameters \(\alpha\) and \(\beta\) are shape and scale parameters of the distribution. They can be interpreted as strength and scale of the prior belief about the variances.
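The Gamma-inverse-gamma relationship can be sanity-checked by simulation. Note that NumPy's gamma sampler is parameterized by shape and scale, so a rate of \(\beta\) corresponds to scale \(1/\beta\); the parameter values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta = 3.0, 2.0  # illustrative shape and scale

# numpy parameterizes Gamma by shape and *scale*, so rate beta -> scale 1/beta.
X = rng.gamma(shape=alpha, scale=1.0 / beta, size=200_000)
inv_gamma_samples = 1.0 / X

# For alpha > 1 the inverse-gamma mean is beta / (alpha - 1), here 2 / 2 = 1.
empirical_mean = inv_gamma_samples.mean()
```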
(Figure: density of the inverse-gamma distribution.)
We can extend the formulation of the high-fidelity simulation outputs as follows:
\[\begin{equation} Y_{xi} = \underbrace{\theta_x}_{\text{fixed}} + \overbrace{\varepsilon_{xi}}^{\text{random}}, \quad \varepsilon_{xi} \sim \mathcal{N}(0, \sigma_x^2), \end{equation}\]
where \(Y_{xi}\) is the \(i\)-th simulation output for design \(x\), \(\theta_x\) is the true performance of design \(x\), and \(\varepsilon_{xi}\) is the random error term following a normal distribution with mean 0 and design-specific variance \(\sigma_x^2\).
We assume that the variances of the high-fidelity simulations, \(\sigma_x^2\), follow an inverse-gamma distribution with unknown shape parameter \(\alpha\) and scale parameter \(\beta\):
\[\begin{equation} \sigma_x^2 \sim \text{InvGamma}(\alpha, \beta), \quad \text{for } x = 1, \ldots, k, \end{equation}\]
where \(\alpha > 0\) and \(\beta > 0\) are the unknown parameters to be estimated from the data.
The pdf of the inverse-gamma distribution is given by:
\[\begin{equation} f(\sigma_x^2 \mid \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} (\sigma_x^2)^{-(\alpha + 1)} \exp \left( -\frac{\beta}{\sigma_x^2} \right), \end{equation}\]
where \(\Gamma(\cdot)\) is the gamma function.
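As a numerical sanity check, the density above can be transcribed directly and integrated over a grid; it should integrate to approximately one (the grid bounds below are ad hoc but capture essentially all of the mass for the chosen parameters):

```python
import numpy as np
from math import gamma as gamma_fn

def invgamma_pdf(s2, alpha, beta):
    # Direct transcription of f(sigma_x^2 | alpha, beta) from the equation above.
    return (beta ** alpha / gamma_fn(alpha)) * s2 ** (-(alpha + 1)) * np.exp(-beta / s2)

# Riemann-sum check that the density integrates to ~1 over (0, infinity);
# the upper bound 200 is ad hoc but holds nearly all the mass for (3, 2).
grid = np.linspace(1e-4, 200.0, 1_000_000)
dx = grid[1] - grid[0]
total = (invgamma_pdf(grid, 3.0, 2.0) * dx).sum()
```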
Random effects modeling provides us with the flexibility to let the data help determine the variability in the simulation outputs.
Case 1: If the data suggests that the variances are similar across all \(k\) designs, then the pdf of the inverse-gamma distribution will be relatively concentrated around a single value, indicating low variability among the designs. The estimated parameters \(\alpha\) and \(\beta\) will be such that the distribution has a small spread and a well-defined peak.
Case 2: If the data suggests that the variances are highly different across designs, then the pdf of the inverse-gamma distribution will be more spread out and may have a heavy tail (allowing for high variances for some design \(x\)). The estimated parameters \(\alpha\) and \(\beta\) will be such that the distribution has a larger spread and accommodates a wider range of variances.
We start by considering the likelihood of the observed data for a single design \(x\). Suppose we have \(l_x\) simulation outputs \(Y_x = (Y_{x1}, \ldots, Y_{xl_x})\) for design \(x\). Assuming that the simulation outputs are normally distributed with mean \(\theta_x\) and variance \(\sigma_x^2\), the likelihood function is given by:
\[\begin{equation} f(Y_x \mid \theta_x, \sigma_x^2) = \prod_{i=1}^{l_x} \frac{1}{\sqrt{2\pi\sigma_x^2}} \exp\left(-\frac{(Y_{xi} - \theta_x)^2}{2\sigma_x^2}\right). \end{equation}\]
We assume that the prior distribution of \(\sigma_x^2\) follows an inverse-gamma distribution with parameters \(\alpha\) and \(\beta\):
\[\begin{equation} f(\sigma_x^2 \mid \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} (\sigma_x^2)^{-(\alpha + 1)} \exp \left( -\frac{\beta}{\sigma_x^2} \right). \end{equation}\]
To obtain the posterior distribution of \(\sigma_x^2\), we apply Bayes’ theorem, i.e., posterior \(\propto\) likelihood \(\times\) prior:
\[\begin{equation} f(\sigma_x^2 \mid Y_x, \theta_x, \alpha, \beta) \propto f(Y_x \mid \theta_x, \sigma_x^2) f(\sigma_x^2 \mid \alpha, \beta). \end{equation}\]
Substituting the likelihood and prior distributions, we have:
\[\begin{align} f(\sigma_x^2 \mid Y_x, \theta_x, \alpha, \beta) &\propto \left(\prod_{i=1}^{l_x} \frac{1}{\sqrt{2\pi\sigma_x^2}} \exp\left(-\frac{(Y_{xi} - \theta_x)^2}{2\sigma_x^2}\right)\right) \frac{\beta^\alpha}{\Gamma(\alpha)} (\sigma_x^2)^{-(\alpha + 1)} \exp \left( -\frac{\beta}{\sigma_x^2} \right) \\ &\propto (\sigma_x^2)^{-\frac{l_x}{2}} \exp\left(-\frac{\sum_{i=1}^{l_x}(Y_{xi} - \theta_x)^2}{2\sigma_x^2}\right) (\sigma_x^2)^{-(\alpha + 1)} \exp \left( -\frac{\beta}{\sigma_x^2} \right) \\ &\propto (\sigma_x^2)^{-(\alpha + \frac{l_x}{2} + 1)} \exp\left(-\frac{\beta + \frac{1}{2}\sum_{i=1}^{l_x}(Y_{xi} - \theta_x)^2}{\sigma_x^2}\right). \end{align}\]
The posterior distribution of \(\sigma_x^2\) is proportional to the product of the likelihood and the prior, and we can recognize it as an inverse-gamma distribution with updated parameters:
\[\begin{equation} \sigma_x^2 \mid Y_x, \theta_x, \alpha, \beta \sim \text{InvGamma}\left(\alpha + \frac{l_x}{2}, \beta + \frac{1}{2}\sum_{i=1}^{l_x}(Y_{xi} - \theta_x)^2\right). \end{equation}\]
The posterior parameters have intuitive interpretations:
The shape parameter is updated to \(\alpha + \frac{l_x}{2}\), which incorporates the number of observed simulations \(l_x\) for design \(x\).
The scale parameter is updated to \(\beta + \frac{1}{2}\sum_{i=1}^{l_x}(Y_{xi} - \theta_x)^2\), which includes the sum of squared deviations of the observed simulations from the mean \(\theta_x\).
The posterior mean of \(\sigma_x^2\) can be computed as:
\[\begin{equation} \mathbb{E}[\sigma_x^2 \mid Y_x, \theta_x, \alpha, \beta] = \frac{\beta + \frac{1}{2}\sum_{i=1}^{l_x}(Y_{xi} - \theta_x)^2}{\alpha + \frac{l_x}{2} - 1}, \end{equation}\]
which is a weighted average of the population-level (design landscape) information (captured by \(\alpha\) and \(\beta\)) and the observed data (represented by the sum of squared deviations \(\sum_{i=1}^{l_x}(Y_{xi} - \theta_x)^2\)).
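The conjugate update and the posterior mean can be checked numerically; the prior hyperparameters and the true \((\theta_x, \sigma_x^2)\) below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)

alpha, beta = 3.0, 2.0          # hypothetical prior hyperparameters
theta_x, sigma2_x = 1.0, 1.5    # illustrative true mean and variance
l_x = 40
Y_x = rng.normal(theta_x, np.sqrt(sigma2_x), size=l_x)

# Posterior: InvGamma(alpha + l_x/2, beta + (1/2) * sum_i (Y_xi - theta_x)^2)
alpha_post = alpha + l_x / 2
beta_post = beta + 0.5 * np.sum((Y_x - theta_x) ** 2)

# Posterior mean of sigma_x^2 (finite since alpha_post > 1); for moderate l_x
# it should land near the true variance used to generate the data.
post_mean = beta_post / (alpha_post - 1)
```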
Frequentist vs. Bayesian:
Choices of \(\alpha\) and \(\beta\);
Starting values and prior distributions;
Fixed-and-unknown (frequentist) vs. Bayesian estimation.
We have the complete data log-likelihood (known variances):
\[\begin{equation} \begin{split} \log L(Y, \theta, g \mid Z, \tau, \mu, \Sigma) = & \sum_{x = 1}^{k} \sum_{m = 1}^{M} Z_{xm} \left( \log \tau_m + \log \phi( (\theta_x, g_x) \mid \mu_m, \Sigma_m ) \right) \\ & + \sum_{x = 1}^{k} \sum_{i = 1}^{l_x} \log \phi(Y_{xi} \mid \theta_x, \sigma_x^2). \end{split} \end{equation}\]
The updated complete data log-likelihood, incorporating the random effects variances, becomes:
\[\begin{equation} \begin{split} \log L(Y, \theta, g, \sigma^2 \mid Z, \tau, \mu, \Sigma, \alpha, \beta) = & \sum_{x = 1}^{k} \sum_{m = 1}^{M} Z_{xm} \left( \log \tau_m + \log \phi( (\theta_x, g_x) \mid \mu_m, \Sigma_m ) \right) \\ & + \sum_{x = 1}^{k} \left[ \sum_{i = 1}^{l_x} \log \phi(Y_{xi} \mid \theta_x, \sigma_x^2) + \log f(\sigma_x^2 \mid \alpha, \beta) \right], \end{split} \end{equation}\]
The EM algorithm accommodates the random effects variances seamlessly. The M-step updates for \(\alpha\) and \(\beta\) are obtained by solving the following score equations:
\[\begin{equation} \frac{\partial Q}{\partial \alpha} = k \log \beta - k \psi(\alpha) - \sum_{x = 1}^{k} \log \sigma_x^2 = 0, \end{equation}\]
\[\begin{equation} \frac{\partial Q}{\partial \beta} = \frac{k \alpha}{\beta} - \sum_{x = 1}^{k} \frac{1}{\sigma_x^2} = 0, \end{equation}\]
where \(Q\) is the expected complete data log-likelihood and \(\psi(\cdot)\) is the digamma function:
\[\begin{equation} \psi(x) = \frac{d}{dx} \log \Gamma(x) = \frac{\Gamma'(x)}{\Gamma(x)} \end{equation}\]
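A stdlib-only sketch of this M-step: \(\partial Q/\partial \beta = 0\) gives \(\beta = k\alpha \big/ \sum_x \sigma_x^{-2}\), and substituting this into the \(\alpha\)-equation leaves a single equation in \(\alpha\), solved here by bisection. The digamma approximation and the bracketing interval are ad hoc choices:

```python
import math

def digamma(x):
    # Asymptotic series plus the recurrence psi(x) = psi(x+1) - 1/x (stdlib-only).
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1.0 / 12 - f * (1.0 / 120 - f / 252))

def m_step(sigma2, lo=1e-3, hi=100.0, iters=200):
    """Solve dQ/dalpha = 0 and dQ/dbeta = 0 for (alpha, beta).

    From dQ/dbeta = 0: beta = k * alpha / sum(1/sigma_x^2). Substituting
    into the alpha-equation gives a one-dimensional root-finding problem,
    handled by bisection over an ad hoc bracket [lo, hi].
    """
    k = len(sigma2)
    s_inv = sum(1.0 / s for s in sigma2)
    s_log = sum(math.log(s) for s in sigma2)

    def score(a):
        beta = k * a / s_inv
        return k * math.log(beta) - k * digamma(a) - s_log

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(lo) * score(mid) <= 0:
            hi = mid
        else:
            lo = mid
    alpha = 0.5 * (lo + hi)
    return alpha, k * alpha / s_inv
```

By construction the \(\beta\)-equation holds exactly at the returned values, and the \(\alpha\)-equation holds to bisection precision.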
In the E-step, each \(\sigma_x^2\) enters through its posterior mean:
\[\begin{equation} \mathbb{E}[\sigma_x^2 \mid Y_x, \theta_x, \alpha, \beta] = \frac{\beta + \frac{1}{2}\sum_{i=1}^{l_x}(Y_{xi} - \theta_x)^2}{\alpha + \frac{l_x}{2} - 1}, \end{equation}\]
E-step: Update the latent membership variables \(Z\) and the posterior means and variances of the random effects variances \(\sigma_x^2\) using current parameter estimates and the observed high-fidelity data \(Y_{xi}\).
M-step: Update the parameters \(\tau\), \(\mu\), \(\Sigma\), \(\alpha\), and \(\beta\) by maximizing the expected complete data log-likelihood.
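A simplified end-to-end sketch of this loop, dropping the mixture component and replacing the exact score equations in the M-step with inverse-gamma moment matching (a cruder but compact substitute: mean \(m = \beta/(\alpha-1)\) and variance \(v = m^2/(\alpha-2)\) give \(\alpha = m^2/v + 2\), \(\beta = m(\alpha-1)\)); all data and starting values are synthetic:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic data: k designs, each with l = 30 replications.
k, l = 8, 30
true_sigma2 = 1.0 / rng.gamma(3.0, 1.0 / 2.0, size=k)   # sigma_x^2 ~ InvGamma(3, 2)
Y = [rng.normal(0.0, np.sqrt(s2), size=l) for s2 in true_sigma2]

alpha, beta = 2.5, 1.0  # starting values (arbitrary)
theta_hat = np.array([y.mean() for y in Y])

for _ in range(50):
    # E-step: posterior mean of sigma_x^2 under the current (alpha, beta).
    ss = np.array([np.sum((Y[x] - theta_hat[x]) ** 2) for x in range(k)])
    e_sigma2 = (beta + 0.5 * ss) / (alpha + l / 2 - 1)

    # M-step (moment matching, standing in for the exact score equations):
    m, v = e_sigma2.mean(), e_sigma2.var()
    alpha = m * m / v + 2.0
    beta = m * (alpha - 1.0)
```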
Recall the OCBA rule[^1]:
\[\begin{equation} \left( \frac{\beta_k}{\sigma_k} \right)^2 = \sum_{x \neq k} \left( \frac{\beta_x}{\sigma_x} \right)^2 \end{equation}\]
\[\begin{equation} \frac{(\mu_x - \mu_k)^2}{\frac{\sigma_x^2}{\beta_x} + \frac{\sigma_k^2}{\beta_k}} = \frac{(\mu_{x'} - \mu_k)^2}{\frac{\sigma_{x'}^2}{\beta_{x'}} + \frac{\sigma_k^2}{\beta_k}}, \quad \forall x, x' \neq k, \end{equation}\]
where \(\beta_1, \dots, \beta_k\) are the proportions of the simulation budget allocated to each design (not to be confused with the inverse-gamma scale parameter \(\beta\)).
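The following sketch computes budget proportions from the usual closed-form OCBA solution, \(\beta_x \propto (\sigma_x/\delta_x)^2\) for \(x \neq k\) with the best design's share set by the first condition; this satisfies the first balance condition exactly and the second only asymptotically. All inputs are illustrative, and in our procedure the plug-in for the variances would be the posterior means \(\mathbb{E}[\sigma_x^2]\):

```python
import numpy as np

def ocba_proportions(mu, sigma2, best=None):
    """Budget proportions from the closed-form OCBA solution (a sketch).

    mu, sigma2: current estimates of design means and variances; in our
    setting sigma2 would be the posterior means E[sigma_x^2]. Assumes a
    larger mean is better and the best design is unique.
    """
    mu = np.asarray(mu, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    b = int(np.argmax(mu)) if best is None else best
    delta = mu - mu[b]                    # mean gaps to the best design

    w = np.zeros_like(mu)
    others = np.arange(len(mu)) != b
    w[others] = sigma2[others] / delta[others] ** 2   # (sigma_x / delta_x)^2
    # First balance condition: w_b^2 / sigma_b^2 = sum_{x != b} w_x^2 / sigma_x^2
    w[b] = np.sqrt(sigma2[b] * np.sum(w[others] ** 2 / sigma2[others]))
    return w / w.sum(), b

props, b = ocba_proportions([1.0, 1.3, 2.0], [0.4, 0.9, 0.5])
```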
Generate new simulations for designs:
In each iteration, we allocate a new batch of the simulation budget across the \(k\) designs based on the current estimates.
The allocation can be determined using the OCBA allocation rule, which would plug in the posterior variances \(\mathbb{E}[\sigma^2_x]\) and the current estimates of the design means.
Update the estimation parameters, including \(\alpha\) and \(\beta\), from the streaming data:
Given a workable streaming-data update scheme (e.g., SAEM) for estimating all the parameters, we can incrementally update them as each new batch of simulations arrives.
We use the posterior variance \(\mathbb{E}[\sigma_x^2]\) and OCBA allocation rule to generate the next round of simulation budget and return to step 1.
Analyze MFBA to show its PCS (probability of correct selection) guarantees.